https://github.com/stephenrho/bayes-workshop
In R set working directory to bayes-workshop (or bayes-workshop-master)
Last updated: 2019-10-25
We will cover:
- The brms package (with example analyses)
- The Stan model language (if there's time)

I've tried to include links throughout that will be useful when you come to run your own Bayesian analyses.
It tells you what you want to know!
Given the data (and our prior knowledge), what is the probability of different parameter values?

\[ p(\theta \mid y) = \frac{p(\theta)p(y \mid \theta)}{p(y)} \]
\(p(y)\) does not depend on the model parameters, so we can omit it in favor of the unnormalized posterior:
\[ p(\theta \mid y) \propto p(\theta)p(y \mid \theta) \]
The posterior is proportional to the prior times the likelihood
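As a concrete sketch of "posterior is proportional to prior times likelihood", here is a grid approximation for a normal mean with known standard deviation (the data, prior, and sd values below are illustrative, not from the workshop):

```r
set.seed(5)
Y <- rnorm(30, mean = 5, sd = 10)  # stand-in data

mu_grid <- seq(-20, 30, length.out = 500)      # candidate values for mu
prior   <- dnorm(mu_grid, mean = 0, sd = 10)   # prior p(theta)
lik     <- sapply(mu_grid, function(m)         # likelihood p(y | theta)
  exp(sum(dnorm(Y, mean = m, sd = 10, log = TRUE))))

post <- prior * lik        # unnormalized posterior
post <- post / sum(post)   # normalize over the grid
mu_grid[which.max(post)]   # posterior mode, pulled between 0 (prior) and mean(Y)
```

With a flat enough prior the mode sits near the sample mean; a tighter prior would shrink it toward 0.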
\[ L(\theta \mid \mathbf{y}) = \prod_{i=1}^{n} f(y_i \mid \theta) \]
In R:
# likelihood of data for mu = 5 and sd = 10
prod(dnorm(x = Y, mean = 5, sd = 10))
# or, equivalently, via the log likelihood
exp(sum(dnorm(Y, 5, 10, log = TRUE)))
## [1] 1.820996e-79
We could use optim to search for the parameters that maximize the above (or minimize the negative log likelihood).

"Weakly informative prior should contain enough information to regularize: the idea is that the prior rules out unreasonable parameter values but is not so strong as to rule out values that might make sense" (Stan prior choice recommendations)
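The optim step might look like the following sketch (Y and the starting values are illustrative; sigma is optimized on the log scale so it stays positive):

```r
set.seed(123)
Y <- rnorm(50, mean = 5, sd = 10)  # stand-in data

# negative log likelihood for a normal model; par = c(mu, log(sigma))
negLL <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

fit <- optim(par = c(0, 0), fn = negLL, y = Y)
c(mu = fit$par[1], sigma = exp(fit$par[2]))  # close to mean(Y) and sd(Y)
```

Minimizing the negative log likelihood is numerically safer than maximizing the raw likelihood, which underflows quickly (as the tiny product above shows).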
Going back to our reading time example, say we have a fairly good idea of the reading rate (words per second) of the general population:
For the standard deviation of reading times (\(\sigma\)), we might be less certain:
We can specify the priors separately
But they determine a joint distribution
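To illustrate, independent draws from each prior can be paired to form draws from the implied joint prior (the specific distributions and values below are illustrative stand-ins, not the workshop's):

```r
set.seed(1)
n <- 1e4
mu_prior    <- rnorm(n, mean = 5, sd = 2)        # prior on reading rate (words/sec)
sigma_prior <- abs(rnorm(n, mean = 0, sd = 10))  # half-normal: sd must be positive

joint <- data.frame(mu = mu_prior, sigma = sigma_prior)
# plot(joint) would show the implied joint prior distribution
```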
Another example, linear regression:
\[ y_i = \beta_0 + \beta_1x_i + \epsilon_i \]
\[ \epsilon_i \sim \mbox{Normal}(0, \sigma) \]
Or
\[ y_i \sim \mbox{Normal}(\beta_0 + \beta_1x_i, \sigma) \]
If \(x\) and \(y\) have been scaled (\(z\)-scored), a reasonable choice would be:
These priors essentially say that we expect either a positive or negative relationship between \(x\) and \(y\).
If we had strong reason to expect that \(y\) should increase with \(x\) we could instead use:
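In brms, priors like these could be expressed with set_prior (a sketch; the normal(0, 1) values assume \(x\) and \(y\) have been scaled as above):

```r
library(brms)

# weakly informative priors for scaled x and y
priors <- c(
  set_prior("normal(0, 1)", class = "Intercept"),
  set_prior("normal(0, 1)", class = "b")  # slope may be positive or negative
)

# if y is expected to increase with x, bound the slope below at zero
priors_pos <- set_prior("normal(0, 1)", class = "b", lb = 0)
```

The lb argument truncates the prior, which is one way to encode a directional expectation.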
For correlation matrices, brms and Stan use the LKJ prior (after Lewandowski, Kurowicka, &amp; Joe, 2009).
In lme4, convergence can be an issue: often the model is too complex to be supported by the data.

Markov Chain Monte Carlo
Goal of MCMC - to approximate a target distribution
Note: Warm up for brms/Stan is more complicated and serves to tune the sampling parameters
A way of estimating the number of independent samples after accounting for auto-correlation:
\[ ESS = \frac{N}{1 + 2\sum_{k = 1}^{\infty} \rho_k} \]
Where \(\rho_k\) is the auto-correlation at lag \(k\). Think of this as dividing the number of samples by the amount of auto-correlation. In practice the sum stops when the auto-correlation is small (e.g. \(\rho_k < 0.05\); see Kruschke, 2015, p. 184).
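A rough implementation of this formula, truncating the sum once \(\rho_k < 0.05\) (a sketch; real MCMC software uses more careful estimators):

```r
# effective sample size from a chain's autocorrelation
ess <- function(chain, cutoff = 0.05) {
  ac <- acf(chain, lag.max = length(chain) - 1, plot = FALSE)$acf[-1]  # drop lag 0
  k  <- which(ac < cutoff)[1]  # first lag where autocorrelation is small
  if (is.na(k)) k <- length(ac) + 1
  length(chain) / (1 + 2 * sum(ac[seq_len(k - 1)]))
}

set.seed(1)
iid_draws <- rnorm(1000)                                  # independent draws
ar_draws  <- as.numeric(arima.sim(list(ar = 0.9), 1000))  # autocorrelated draws
ess(iid_draws)  # close to 1000
ess(ar_draws)   # far fewer effective samples
```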
How do we know that we have converged onto a stable distribution?
For each step in the chain (or a random subset of the chain), simulate \(N\) observations from the model with parameters set to the current step in the chain (where \(N\) is the size of the original data set)
We can compare these simulated outcomes to the observed data
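A sketch of that simulation step for the normal model, assuming draws holds posterior samples of \(\mu\) and \(\sigma\) (the draws and data below are stand-ins, not real MCMC output):

```r
set.seed(2)
Y <- rnorm(100, mean = 5, sd = 10)               # observed data (stand-in)
draws <- data.frame(mu    = rnorm(200, 5, 0.5),  # stand-in posterior draws
                    sigma = abs(rnorm(200, 10, 1)))

N <- length(Y)
# one simulated data set (column) per posterior draw
y_rep <- apply(draws, 1, function(d) rnorm(N, mean = d["mu"], sd = d["sigma"]))

# compare simulated to observed, e.g. via the mean of each simulated data set
hist(colMeans(y_rep)); abline(v = mean(Y), col = "red")
```

If the observed statistic sits in the tails of its simulated distribution, the model is failing to capture that aspect of the data.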
Both try to estimate the expected log predictive density (elpd) for new data
These can be computed with the loo package (see Vehtari et al., 2017 for details). Larger elpd is better, but note that LOO and WAIC are often reported on the deviance scale (multiplied by -2), in which case smaller values indicate better fit.
Where LOO is unreliable (as we'll see in the brms examples), \(K\)-fold cross validation can be used.

Bayes factors:

\[ \frac{p(M_1 \mid y)}{p(M_2 \mid y)} = \frac{p(y \mid M_1)}{p(y \mid M_2)} \times \frac{p(M_1)}{p(M_2)} \]
\[ \frac{p(M_1 \mid y)}{p(M_2 \mid y)} = B_{1,2} \times \frac{p(M_1)}{p(M_2)} \]
\[ p(y \mid M) = \int p(y \mid \theta, M) p(\theta \mid M) d\theta \]
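For intuition, this integral can be computed numerically in a one-parameter example: a normal likelihood with known sd and a normal prior on the mean (all values illustrative):

```r
set.seed(3)
Y <- rnorm(20, mean = 5, sd = 10)  # stand-in data

# integrand: likelihood of the data times the prior density, as a function of mu
integrand <- function(mu) {
  sapply(mu, function(m)
    exp(sum(dnorm(Y, mean = m, sd = 10, log = TRUE))) * dnorm(m, mean = 0, sd = 5))
}

# integrate over (effectively) the whole range of the prior
marg_lik <- integrate(integrand, lower = -50, upper = 50)$value
marg_lik  # the prior-averaged probability of the data under this model
```

Averaging the likelihood over the prior is what makes the marginal likelihood (and hence Bayes factors) sensitive to the choice of prior.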
Bayes factors can be calculated with the BayesFactor package (only normal models). You will see both of these around…
- credible intervals (reported by brms)
- highest density intervals (use HDInterval::hdi() to calculate)
- Region Of Practical Equivalence (ROPE)
Steps we'll follow in our brms examples:
Fit the model (with brms, Stan, etc.) and ensure chains have converged (we'll cover other possible warnings/errors specific to Stan).

Further reading:
The next slides show examples of simulated data for the linear regression example (\(y_i \sim \mbox{Normal}(\beta_0 + \beta_1x_i, \sigma)\)), with 100 observations at \(x = 0\) and 100 at \(x = 1\).
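A sketch of that simulation (the parameter values here are placeholders, not the ones used in the workshop slides):

```r
set.seed(4)
b0 <- 2; b1 <- 0.5; sigma <- 1  # illustrative true values

x <- rep(c(0, 1), each = 100)   # 100 observations per group
y <- rnorm(length(x), mean = b0 + b1 * x, sd = sigma)

coef(lm(y ~ x))  # estimates should land close to b0 and b1
```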